scikit-learn: make_multilabel_classification
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_multilabel_classification.html
Generate a random multilabel classification problem.
Arguments
n_samples
n_features
n_classes
n_labels
The average number of labels per instance.
More precisely, the number of labels per sample is drawn from a Poisson distribution with n_labels as its expected value, but samples are bounded (using rejection sampling) by n_classes,
"The number of labels for each sample is drawn from a Poisson distribution whose expected value is n_labels."
"However, each draw is capped at n_classes (via rejection sampling)."
and must be nonzero if allow_unlabeled is False.
"If the allow_unlabeled argument is False, the label count must also be nonzero."
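To make the rejection-sampling rule concrete, here is a minimal sketch of how a single label count could be drawn. It is an illustration of the rule quoted above, not a copy of scikit-learn's code; the helper name sample_label_count and the use of np.random.default_rng are my own assumptions.

```python
import numpy as np


def sample_label_count(rng, n_labels, n_classes, allow_unlabeled):
    """Draw one label count by rejection sampling (hypothetical helper)."""
    while True:
        k = int(rng.poisson(n_labels))
        # Reject draws that exceed the number of available classes.
        if k > n_classes:
            continue
        # Reject zero when unlabeled samples are not allowed.
        if k == 0 and not allow_unlabeled:
            continue
        return k


rng = np.random.default_rng(0)
counts = [
    sample_label_count(rng, n_labels=2, n_classes=5, allow_unlabeled=False)
    for _ in range(1000)
]
print(min(counts), max(counts), sum(counts) / len(counts))
```

Because some draws are rejected, the realized average label count will generally differ a little from n_labels.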
allow_unlabeled
default=True
If True, some instances might not belong to any class.
"Some samples may not belong to any class."
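A quick way to see the effect of allow_unlabeled is to count the rows of Y that are all zeros under both settings. This is only a sketch: the variable names are mine, and the number of unlabeled rows in the allow_unlabeled=True case depends on the random seed (it may even be zero).

```python
from sklearn.datasets import make_multilabel_classification

# Same seed, differing only in allow_unlabeled.
X_u, Y_u = make_multilabel_classification(allow_unlabeled=True, random_state=1)
X_l, Y_l = make_multilabel_classification(allow_unlabeled=False, random_state=1)

# A sample is unlabeled when its row in the indicator matrix sums to zero.
unlabeled_u = int((Y_u.sum(axis=1) == 0).sum())
unlabeled_l = int((Y_l.sum(axis=1) == 0).sum())

print("allow_unlabeled=True :", unlabeled_u, "unlabeled samples")
print("allow_unlabeled=False:", unlabeled_l, "unlabeled samples")
```

With allow_unlabeled=False every row should have at least one label, so the second count is expected to be 0.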
code:example.py
>>> from collections import Counter
>>> from sklearn.datasets import make_multilabel_classification
>>> X, Y = make_multilabel_classification(random_state=1)
>>> X[0]  # feature values are whole numbers (stored as floats)
array([6., 0., 3., 5., 4., 1., 1., 0., 0., 0., 1., 0., 6., 0., 3., 0., 4.,
       2., 2., 4.])
>>> Y[0]
array([0, 0, 0, 0, 1])
>>> X[1]
array([3., 0., 4., 4., 2., 4., 1., 1., 3., 0., 5., 2., 5., 3., 3., 3., 1.,
       3., 6., 5.])
>>> Y[1]
array([0, 1, 0, 0, 1])
>>> # Count how many samples carry each class label. TODO: consider using np.bincount
>>> counter = Counter(idx for labels in Y for idx, label in enumerate(labels) if label == 1)
>>> counter.most_common()  # 0 stands for class_0, 1 for class_1, 2 for class_2, ...
[(1, 61), (0, 54), (3, 42), (4, 26), (2, 5)]
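Following up on the np.bincount TODO: since Y is a 0/1 indicator matrix, the per-class counts can probably be had without a Python-level loop. The sketch below uses a small hand-made matrix (toy data, not the generated dataset) so the result is easy to check by hand; the variable names are my own.

```python
import numpy as np

# Toy indicator matrix: 3 samples, 5 classes (made up for illustration).
Y_toy = np.array([
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
])

# Option 1: column sums give the number of samples per class directly.
counts_sum = Y_toy.sum(axis=0)

# Option 2: np.bincount over the column indices of the nonzero entries.
# minlength keeps classes that never occur in the result as zeros.
counts_bincount = np.bincount(np.nonzero(Y_toy)[1], minlength=Y_toy.shape[1])

print(counts_sum)       # [1 2 0 0 2]
print(counts_bincount)  # [1 2 0 0 2]
```

Unlike Counter.most_common(), both options return counts in class order and keep classes with zero occurrences; sorting by count would be a separate step.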